Identifying Semantic Divergences in Parallel Text without Annotations
Abstract
Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs using a deep neural model of bilingual semantic similarity that can be trained on any parallel corpus without manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.
Similar Resources
Cross-lingual Word Sense Disambiguation for Predicate Labelling of French
We address the problem of transferring semantic annotations, more specifically predicate labellings, from one language to another using parallel corpora. Previous work has transferred these annotations directly at the token level, leading to low recall. We present a global approach to annotation transfer that aggregates information across the whole parallel corpus. We show that this global meth...
A New Life for Semantic Annotations?
Semantic annotation has so far been approached in essentially the same way as annotation at other levels of linguistic information, namely as the business of labeling text with certain tags which add certain information to the text, in this case, semantic information. Semantic role labeling is a case in point. This may be very useful, for instance for determining the variety of ways in which ce...
Global Methods for Cross-lingual Semantic Role and Predicate Labelling
We address the problem of transferring semantic annotations to new languages using parallel corpora. Previous work has transferred these annotations on a token-to-token basis, an approach that is sensitive to alignment errors and translation shifts. We present a global approach to transfer that aggregates information across the whole parallel corpus and leads to more robust labellers. We build ...
Active Learning with Multiple Annotations for Comparable Data Classification Task
Supervised learning algorithms for identifying comparable sentence pairs from predominantly non-parallel corpora require resources for computing feature functions as well as for training the classifier. In this paper we propose active learning techniques for addressing the problem of building comparable data for low-resource languages. In particular we propose strategies to elicit two kinds of annot...
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism, which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them”, poses a major challenge to the spread and publication of knowledge. Plagiarism has been placed in four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism, which is sometimes referred to as cross-li...